Xu et al. Harm Reduction Journal 2014, 11:9
http://www.harmreductionjournal.com/content/11/1/9







RESEARCH Open Access 



Estimating the number of injection drug users in 
greater Victoria, Canada using capture-recapture 
methods 

Yuan Xu¹, Murray Fyfe², Liz Walker² and Laura LE Cowen¹



Abstract 

Background: Population size estimation is critical for planning public health programmes for injection drug users. 
Estimation is difficult, as these populations are considered 'hidden' or 'hard to reach'. The currently accepted 
population size estimate for greater Victoria, Canada is between 1,500 and 2,000 individuals, which is dated prior to 
the year 2000, and is likely an underestimate. 

Methods: We used three mark-recapture methods (the Lincoln-Petersen estimator, Huggins' model, and Pledger's 
model) to estimate population size using cross-sectional survey data collected in 2003 and 2005. Data come from a 
closed population with two time-ordered samples from the same source. We compare our estimates with the 
currently accepted estimate that is based on the registry of a Victoria needle exchange. 

Results: All methods provided population size estimates that were higher than the currently accepted estimate.
Huggins' method produced wider confidence intervals. Point estimates of population size from the three methods
ranged from 3,329 to 3,342. 

Conclusions: Our estimates will aid health authorities in planning for harm reduction programmes. Repeating the 
methods as further phases of I-Track data become available will ensure that the population estimates remain up to
date. 

Background

Prevalence of HIV and hepatitis C in many injection 
drug user (IDU) populations is higher than in the general 
population; the same can be said for the injection drug 
user population of greater Victoria, British Columbia, 
Canada (City of Victoria and the 12 other members of the 
Capital Regional District) where both open and hidden 
use are known to occur [1]. IDUs are faced with many 
other challenges to their well-being, and public health 
authorities are charged with the duty of providing various 
harm prevention services from basic health care, addic- 
tions treatment, and counselling, to harm reduction edu- 
cation. Knowledge of the number of injection drug users 
within a population would aid both health authorities and 
community organisations in assessing coverage of existing
programmes and in the planning and delivery of a range 
of public health services. 

AIDS Vancouver Island (AVI)'s needle exchange 
programme was established in 1988, providing clean 
syringes for IDU residents of Victoria and surrounding 
areas including the Gulf Islands. The client load of 
AVI's needle exchange programme [2] was used to pro- 
duce the only estimate available for the number of 
IDUs in the Capital Health Region. This estimate pub- 
lished in 2000 was 1,500-2,000 individuals [1]; how- 
ever, there are no specific details on how this estimate 
was determined. In 2008, the fixed-site needle ex- 
change location in Victoria was closed, and needle 
exchange services are now provided on a mobile basis. 
Other agencies have also started offering clean supplies 
to IDU clients in Victoria since the client load estimate 
was generated. It is therefore unknown how reliable 
the use of the current needle exchange programme
registry is for assessing the size of the IDU population 
in greater Victoria. An accurate estimate of IDUs is 
vital to the planning of health services for this 
population. 

To track changes in the prevalence of HIV and hepa- 
titis C as well as risk behaviours, the Public Health 
Agency of Canada in collaboration with regional health 
authorities developed the national, cross-sectional I- 
Track survey [3]. Phase I and phase II of the I-Track 
survey were completed in Victoria in 2003 and 2005, 
respectively. With only two samples from the I-Track 
survey (phase I and phase II), a closed population model 
must be implemented, as three or more samples are 
required to implement open population models. We use 
three closed population mark-recapture models to esti- 
mate the number of IDUs in greater Victoria, BC and 
compare the estimates obtained. 

Mark-recapture models 

Mark-recapture or capture-recapture models come from 
the desire to estimate demographic parameters of wild- 
life populations. Their use in epidemiology is most 
prevalent from a multi-list standpoint where data from 
several sources are combined to serve as samples from a 
population of interest [4]. Multiple data lists are typically 
collected over the same time frame but from different 
sources. For example, Hickman et al. studied injection 
drug use in Brighton, Liverpool, and London from five 
sources, namely arrest referrals, drug treatment reports, 
syringe exchange programmes, accident and emergency 
records, and a community recruitment survey [5]. For 
two-data source studies, samples may be dependent and 
there is no means to test for independence unless three 
data sources are obtained. This is the advantage of time- 
ordered samples — one can model dependence through 
the behaviour of the injection drug users (see discussion 
on trap-happy or trap-shy behaviour). Hook and Regal 
provide an overview of the use of mark-recapture multi- 
list methods [6]. In multi-list studies, there is no natural 
time ordering to the lists; thus, not all wildlife estimation 
techniques are valid [7]. It is less common to see epi- 
demiological studies that sample the population over 
time, likely due to the logistics and resources required 
for such an undertaking (see [8,9] for examples). How- 
ever, if done, the time ordering of samples offers an op- 
portunity to use different estimation procedures than in 
multi-list studies. 

In wildlife studies, individuals are captured, marked 
with a unique identifier, and returned to mix back into 
the population. In subsequent samples, marked individ- 
uals are identified (recaptured) and unmarked individ- 
uals are given marks before release. Thus, an animal's 
capture history is recorded and is represented by a
sequence of 0's (not captured) and 1's (captured) for each
sample occasion. For example, in a two-sample study, an 
animal with a history of {11} was caught at time 1, tagged, 
and released back into the population and was recaptured 
at time 2. In contrast, an animal with a capture history of 
{10} was caught at time 1, tagged and released, and was 
not seen again. 

In studies of human populations, individuals are con- 
tacted (captured), and unique identifiers are obtained 
(marks). Here unique identifiers could be some combin- 
ation of a person's date of birth, initials, age, etc. In sub- 
sequent samples, individuals are again contacted, and 
unique identifiers are obtained. Individuals whose identi- 
fiers match those from the first sample are considered to 
be re-sampled (recaptured). Once more, a capture his- 
tory is developed for each individual in the study. For 
example, in a two-sample study, an individual with a 
capture history of {11} was contacted in the first sample, 
marks were obtained, and the individual was contacted 
in the second sample. An individual with a capture his- 
tory of {01} was only contacted in the second sample. 
For the purposes of this paper, the mark-recapture ter- 
minology used in wildlife models will be used to refer to 
human populations. 
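
As a minimal sketch of how capture histories arise from matched identifiers in a two-sample study, the Python below tallies the histories from two lists of identifiers. The identifier strings and the function name `capture_histories` are hypothetical and chosen only for illustration; they are not taken from the I-Track data.

```python
from collections import Counter

def capture_histories(sample1, sample2):
    """Return a dict mapping each identifier to its two-occasion capture history."""
    ids1, ids2 = set(sample1), set(sample2)
    histories = {}
    for ident in ids1 | ids2:
        # '1' if the identifier was seen on that occasion, '0' otherwise
        histories[ident] = ("1" if ident in ids1 else "0") + ("1" if ident in ids2 else "0")
    return histories

# Hypothetical identifiers (not real study data)
phase1 = ["id_a", "id_b", "id_c"]
phase2 = ["id_b", "id_d"]

histories = capture_histories(phase1, phase2)
counts = Counter(histories.values())
print(dict(counts))
```

The history {00} never appears in such a tally, because individuals missed on both occasions are by definition unobserved; estimating how many there are is the purpose of the models that follow.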

The three estimators we implemented were the 
Lincoln-Petersen estimator [10], a conditional likelihood
estimator [11,12], and a maximum likelihood estimator 
with finite mixtures [13]. The Lincoln-Petersen (LP) esti- 
mator (see [10]) is widely used in epidemiological two- 
sample studies (for example, see [8,9]). Its limitations are 
largely due to model assumptions, which are similar for 
the other methods that we explored and are as follows: 

1. The population is closed (no births or deaths, 
immigrations or emigrations). 

2. The probability of capture is the same for each 
individual in the population within a sample. 

3. Samples are independent. 

4. Marks are not lost. 

The other two estimators share these assumptions but 
provide methods to relax assumption 2. 

Assumption 2 leads to the assumption that samples 
are independent. Chao describes causes for dependent 
samples, which include behavioural responses (e.g. trap- 
happy or trap-shy — see discussion) and heterogeneity in 
capture probabilities [7]. Incorporation of dependence 
among samples can be done by relaxing assumption 2 
[7], implementing methods reviewed by Otis et al. [14]. 

Assumption 2 of the LP estimator is violated when behaviour
and/or heterogeneity affects the probability of capture.
One method of dealing with heterogeneity in the data is 
to incorporate covariates into the estimation procedure. 
To do so, Huggins introduced a conditional likelihood 
procedure where capture probabilities can vary according
to age, sex, or other factors [11]. Because the covariates 
for uncaptured individuals are unknown, Huggins con- 
structed a likelihood conditional on the captured individ- 
uals so that characteristics of uncaptured individuals are 
not required [11]. Huggins' method also allows capture 
probabilities to depend on an individual's prior capture 
history [11]. The population size is then estimated indir- 
ectly using the capture probability estimates. 

Another estimation procedure that models capture 
probabilities dependent on time, behaviour, and/or het- 
erogeneity was proposed by Pledger, introducing finite 
mixture models to partition the individuals into two or 
more groups with relatively homogeneous capture prob- 
abilities [13]. Pledger's method relaxes assumption 2 but 
does not condition on captured individuals [13]. Rather, 
the likelihood models both captured and non-captured 
individuals, allowing the size of the population (N) to be
a parameter that is estimated directly. Xu and Cowen 
detail these three methods [15]. 

Methods 

I-Track survey

The I-Track survey in Victoria is thoroughly described 
elsewhere [3]. Briefly, consenting participants were re- 
cruited in the downtown core of Victoria through a needle 
exchange programme run by AVI and at shelter services 
run by the Victoria Cool Aid Society. Other recruitment 
attempts were done using posters, flyers, word of mouth, 
and through contact with Vancouver Island Health Au- 
thority staff. Participants were not required to have a 
residence in Victoria or to have resided in Victoria for any 
specific period of time. Monetary compensation ($20.00) 
was provided for answering a questionnaire and providing 
a blood spot sample. Demographic and risk behaviour sta- 
tistics resulting from these surveys are reported elsewhere 
[3]. Phase I completed in November 2003 had 254 partici- 
pants, while phase II completed in June 2005 had 250 
participants. 

Eligibility criteria included being at least 15 years of 
age, being capable of informed consent, having an un- 
derstanding of English or French, having injected non- 
therapeutic drugs in the past 6 months, and participa- 
tion only once per phase. Parental consent was not 
needed, as it is possible to have mature minor consent 
in British Columbia. 

Survey participants were asked to provide their initials, 
gender, and birth date (no proof of identification was re- 
quired for I-Track participation). A computer encryption 
program used these inputs to create a unique identifier 
that would be replicated if the same data were entered 
again in a future phase of the study. This allowed the 
subjects to be linked between different study phases and 
preserved anonymity. This identifier (analogous to a unique
tag in a wildlife study) is the tool that allows for a
mark-recapture study, resulting in the estimation of the number
of injection drug users in greater Victoria, BC.
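
The source does not describe the encryption program itself, so the following is only a sketch of the general idea: a deterministic function of initials, gender, and birth date that returns the same code whenever the same details are entered. The use of SHA-256, the normalisation step, the function name `make_identifier`, and the example inputs are all assumptions made for illustration.

```python
import hashlib

def make_identifier(initials, gender, birth_date):
    """Derive a repeatable code from self-reported details (illustrative only)."""
    # Normalise so that trivial differences in case or spacing do not break a match
    key = "|".join(part.strip().upper() for part in (initials, gender, birth_date))
    return hashlib.sha256(key.encode("utf-8")).hexdigest()

# Hypothetical entries: the same person in two phases, and a different person
id_2003 = make_identifier("ab", "F", "1970-01-31")
id_2005 = make_identifier(" AB", "f", "1970-01-31")
id_other = make_identifier("AB", "F", "1970-02-01")
print(id_2003 == id_2005, id_2003 == id_other)
```

Any scheme of this kind links phases only when the participant reports identical details each time, which is why a changed or misremembered detail behaves like a lost tag.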

To establish that respondents were injection drug 
users, subjects were recruited only after an exchange of 
needles had taken place at the needle exchange. In other 
locations, screening questions were used (e.g. Where on 
your body do you inject? Where do you get your rigs? 
What size needle do you use? When did you last inject?). 
If during the interview the subjects' responses suggested 
a lack of familiarity with terms, their eligibility would be 
questioned. 

Statistical analysis 

We discuss the details of the statistical models in the 
Appendix. We implemented models in Program MARK 
[16]. Model selection was done by forming a set of 
plausible models and using Akaike's information criter- 
ion corrected for small sample sizes (AICc) to choose a 
model from among this candidate set [17]. Goodness-of-fit
testing for closed population models remains an unresolved
problem [18] (see the Appendix for a discussion). However, we
did compare observed with expected counts of each
capture history in the form of Pearson chi-square residuals
(i.e., $\chi^2 = (\text{observed} - \text{expected})^2 / \text{expected}$) [19].

Results 

Table 1 provides basic demographic characteristics of 
the two phases of I-Track data. 

A thorough discussion of the model selection process
for each method is given in Xu and Cowen [15].
Briefly, we examined the eight standard closed popula- 
tion models outlined by Otis et al. [14]. These models 
allow capture probabilities to vary by time, behaviour, 
and/or heterogeneity. The Lincoln-Petersen estimator is 
the model where capture probabilities vary by sample 
time. Of the 254 individuals sampled in phase I and 250 
individuals sampled in phase II, there were 19 individ- 
uals in both samples. The population size estimate is 
3,329 individuals using the Lincoln-Petersen estimator 
(Table 2). For Huggins' method, both the individual's sex
and previous capture history were used as covariates for
modelling the heterogeneity in capture probability. We
also examined models to see if there was additional
group heterogeneity. However, AICc chose the model
with constant capture probabilities and no group heterogeneity.
Similar results occurred with Pledger's method;
there were no time, behaviour, or heterogeneity effects
in the capture probabilities. The model with
constant capture probabilities had the lowest AICc value.

Table 1 Demographics of phases I and II of the I-Track survey

                                                 Phase I (2003)   Phase II (2005)
Percent male                                     73.5             76.0
Percent with high school education or greater    48.4             50.0
Percent aboriginal                               20.6             20.9
Average age (years)                              34.6             38.8
Average age of first injection (years)           23.0             22.8
HIV prevalence (%)                               15.4             12.5
HCV prevalence (%)                               68.5             73.8

Table 2 The estimated number of injection drug users in greater Victoria, BC

Model   N       SE    95% confidence interval
LP      3,329   706   (2,246, 5,078)
H       3,342   709   (2,254, 5,098)
P       3,330   706   (2,246, 5,078)

N, estimated population size; SE, estimated standard error; LP, Lincoln-Petersen
model; H, Huggins' conditional likelihood model; P, Pledger's
mixture model.

Table 2 compares the estimation results from all three 
estimation methods. In terms of the point estimate for 
population size and the confidence intervals, all three 
methods produced similar results; however, none of the 
confidence intervals contains the upper bound estimate 
of '2,000' provided by Stajduhar et al. [1]. Further, the 
estimated standard errors for all methods were also 
similar. 
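
The intervals in Table 2 are asymmetric about the point estimates, and the paper does not state how they were constructed. One common construction for closed population models, a log-normal interval built on the estimated number of individuals never captured, comes close to the reported Lincoln-Petersen interval; treating it as the construction actually used is an assumption. The sketch below uses the figures reported above (254 and 250 participants, 19 in both samples, estimate 3,329, standard error 706); the function name `lognormal_interval` is hypothetical.

```python
import math

def lognormal_interval(n_hat, se, n_observed, z=1.96):
    """Log-normal interval for N, built on the estimated number never captured."""
    f0 = n_hat - n_observed                      # estimated uncaptured individuals
    c = math.exp(z * math.sqrt(math.log(1.0 + (se / f0) ** 2)))
    return n_observed + f0 / c, n_observed + f0 * c

# Distinct individuals observed across both phases: 254 + 250 - 19
observed = 254 + 250 - 19
lower, upper = lognormal_interval(3329, 706, observed)
print(round(lower), round(upper))
```

Because the lower limit can never fall below the number of distinct individuals actually observed, this form of interval is often preferred to a symmetric one when the estimate is much larger than the observed count.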

Pearson chi-square residuals for the model with con- 
stant capture probability (Pledger's model) are provided 
in Table 3. Based on these results, we find no evidence 
for outliers or concerns with fit of the model. Similar 
results were seen for residuals of Huggins' model and 
the LP estimator. 
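
The residual calculation can be sketched directly from the observed and expected counts reported in Table 3. The function name `pearson_residual` is chosen here for illustration.

```python
def pearson_residual(observed, expected):
    """Pearson chi-square contribution for one capture history."""
    return (observed - expected) ** 2 / expected

# Observed and expected counts as reported in Table 3
table3 = {"11": (19, 19.06), "10": (235, 232.91), "01": (231, 232.91)}

residuals = {h: pearson_residual(o, e) for h, (o, e) in table3.items()}
for history, value in residuals.items():
    print(history, round(value, 2))
```

All three contributions are far below 1, which is consistent with the statement that the residuals give no indication of poor fit.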

Discussion 

There was some concern that the number of recaptures 
in our study was lower than expected, resulting in popu- 
lation estimates that were higher than the currently ac- 
cepted estimate of 1,500-2,000 individuals. The I-Track 
survey aimed to recruit from a broad spectrum of user 
groups. Forty percent of the I-Track participants were 
recruited at locations other than the needle exchange. 
An IDU population estimate based on the needle ex- 
change programme registry prior to the year 2000 (the 
1,500-2,000 estimate) would miss people who were not
clients of the needle exchange programme and is therefore
likely an underestimate. Our estimate represents
approximately 0.9% of the greater Victoria population,
whereas the proportion of IDUs is approximately 0.2%-0.9%
nationwide [20,21]. However, Victoria has a comparatively
mild climate that may attract street-involved
people from other areas. We therefore argue that it is
reasonable for our estimate to be at the upper end of national
estimates. Because the national estimate is based
on a population survey that covers both urban and rural
locations, it is not directly comparable to our estimate.

Table 3 Observed count, expected count, and Pearson chi-square residual for the model with constant capture probability

Capture history   Observed   Expected   Residual
11                19         19.06      0.00
10                235        232.91     0.02
01                231        232.91     0.02

A lowered recapture rate could also be the result of a 
'trap-shy' response where individuals from the first sur- 
vey avoid being captured in the second survey. For the 
Lincoln-Petersen estimate, this would have resulted in 
an overestimate of population size [7]. However, because behaviour
was modelled in Huggins' and Pledger's models,
a trap-shy response would have appeared as a lower population
estimate from those models; this was not the case.

To explore this issue further, we varied the number of 
recaptures in the Lincoln-Petersen estimator to see how 
this affected the population size estimate (Figure 1). To 
get a Lincoln-Petersen estimate of around 2,000 individ- 
uals, the number of recaptures would have to be at least 
32 individuals. Similarly, having 43 recaptures would 
produce an estimate of around 1,500 individuals. 
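
The sensitivity check described above can be sketched with the textbook Lincoln-Petersen ratio, the product of the two sample sizes divided by the number of recaptures. This ratio form is an assumption about how the figures were produced; with the observed 19 recaptures it gives roughly 3,342, slightly above the 3,329 reported in Table 2, which presumably reflects the likelihood-based fit in Program MARK rather than the simple ratio.

```python
def lincoln_petersen(n1, n2, m):
    """Textbook Lincoln-Petersen estimate: n1 * n2 / m."""
    if m <= 0:
        raise ValueError("at least one recapture is needed")
    return n1 * n2 / m

n1, n2 = 254, 250   # phase I and phase II sample sizes
for m in (19, 32, 43):
    print(m, round(lincoln_petersen(n1, n2, m)))
```

Because the number of recaptures sits in the denominator, each additional recapture lowers the estimate by a smaller amount than the one before, which is why the estimate is most volatile when recaptures are few.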

The closure assumption is likely violated for the I- 
Track data. Deaths could have occurred between the 
two I-Track surveys, people could have moved into or 
out of the region, and initiation or cessation of injection 
could have occurred between samples. To look at the 
stability of injection, we define the average number of 
years of injection as average age minus average age of 
first injection (Table 1); we find this to be 11.6 and 
16.0 years for phases I and II, respectively. The stability 
of the client groups associated with recruitment sites is 
unknown and may have had some impact on the closure 
assumption. 

Violation of the closure assumption can result in
biased estimates, and the bias increases with mobility
into and out of the population [22]. Kendall studied
the effect of closure violations on closed population 
models from the viewpoint of individuals in the popula- 
tion being a subset of a superpopulation [23]. For situa- 
tions where individuals are able to move randomly in 
and out of the study area throughout the study, Kendall 
considered each of the survey samples to be random 
samples from a superpopulation of size $N^0$ [23]. Individuals
in the study area are drawn from the superpopulation
with probability $\tau_i$ and captured with probability $p_{ij}$
on occasion $j$. The closed population estimators are
biased for the group of individuals in the study area on
occasion $j$, but unbiased for the superpopulation. Arguably,
the superpopulation is of more interest than the
number of individuals in the study area at a particular
occasion. The superpopulation for our study would be
all individuals that entered the study area between 2003
and 2005.

Figure 1 Estimated population size using the Lincoln-Petersen estimator varying the number of recaptures. Error bars represent the
point estimate ± two estimated standard errors.

The assumption of homogeneity of capture probabil- 
ities is rarely met in epidemiological studies [24]. This 
can be affected by the behaviour of an individual. For 
example, in animal studies, an animal that enjoyed the 
experience of being caught can become 'trap-happy'. 
Similarly, if an IDU enjoyed the experience of the first I- 
Track study or was positively impacted by the $20.00 re- 
muneration, the person might have looked for opportun- 
ities to participate in the second. On the other hand, if 
an individual did not have a good experience with the 
first I-Track survey, the person might avoid the second 
survey ('trap-shy'). Further, different individuals could 
have intrinsically different capture probabilities, causing 
heterogeneity. Otis et al. specified models that incorpo- 
rated potential sources of variation by modelling capture 
probabilities as dependent on time, behaviour, and/or 
heterogeneity [14]. All of the models we used were based 
on Otis et al.'s work [14]. The conditional model ap- 
proach modelled capture probabilities dependent on the 
sex of the individual, thereby having the potential to fur- 
ther reduce heterogeneity. These models cannot account 
for individuals who have a null probability of being cap- 
tured. If such individuals exist in the population, then 
our estimates would be considered conservative. 



As mentioned, the assumption of independent samples 
can be relaxed and modelled through incorporation of 
behavioural effects or heterogeneity in capture probabil- 
ities. This assumption was not likely violated as models 
that included behaviour or heterogeneity effects were 
not selected. 

As no formal identification is required to participate in 
the study, it is possible for unique identifiers to change 
from one survey to the next, violating the assumption of 
no tag loss. This could happen if an individual forgot the 
information that results in their unique identifier or if 
an individual's unique identifier changed between survey 
phases due to unusual cases such as a name being chan- 
ged (due to marriage for example). If a subject provided 
different identifiers, it would not be possible to link 
them. We argue that this would be rare and would result 
in a reduced number of recaptured individuals produ- 
cing overestimates of population size (see Figure 1). 

When the data from the latest I-Track survey are available,
we would like to use an open population Jolly-Seber
model to remove the closure assumption altogether in future
work [25,26].

As our estimate is quickly becoming a decade old, fur- 
ther estimates to determine if the population size remained 
the same over the last 10 years would be beneficial. Mov- 
ing into an open population framework with more data 
would also allow us to assess whether the population size 
has changed over time. Once established, application of 
this model to future phases of the data would be relatively
straightforward. 

Conclusions 

For the Vancouver Island Health Authority, our popula- 
tion estimates will be helpful in the planning of services to 
meet the health care needs of the IDU population. When 
harm reduction programmes such as fixed-site needle ex- 
changes are implemented to help control the transmission 
of HIV and hepatitis C, knowing the number of potential 
clients will aid in programme development. 

Local experience in Victoria has demonstrated that 
when services are insufficient to meet demand, higher 
risk drug use practices may take place, including needle 
sharing. These higher risk practices may result in threats 
to health such as blood-borne pathogen infections, ab- 
scesses, and overdoses. 

Improved estimates of the population size will assist in 
securing resources required to meet service demands and 
planning the mix of services that may best meet these 
needs. This could include adjustments to number and 
types of locations providing harm reduction services, 
hours of operation, and numbers of staff. Improved esti- 
mates will also better enable an assessment of the impact 
of programmes and policies for this population. 

Appendix 

Statistical model details 

As the I-Track data have a natural time ordering, we 
were not limited to multi-list models. We were able to 
relax the constant probability assumption using Huggins'
model with capture probabilities dependent on the covariate
sex [10]. For our model, the capture probabilities
were modelled using a linear logistic formulation as

$$\log\left(\frac{p_{ij}}{1 - p_{ij}}\right) = \beta_0 + \beta_1\,\mathrm{sex}_i + \beta_2\,z_{ij}$$

where $p_{ij}$ denotes the probability that individual $i$ is captured
at occasion $j$, $\mathrm{sex}_i$ is an indicator variable for the
sex of individual $i$, and $z_{ij}$ is equal to 1 if individual $i$ was
captured before occasion $j$ and 0 otherwise. Thus, covariates
for sex and previous capture history (behaviour of
the individual) were introduced into the model.
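
When the sex and previous-capture terms are dropped, which corresponds to the constant capture probability model that AICc selected, the conditional likelihood for two occasions has a closed-form solution. The sketch below is derived here for illustration and is not taken from the paper: the capture probability is estimated from captured individuals only, and each captured individual then contributes the reciprocal of the probability of being caught at least once. The function name `huggins_constant_p` is hypothetical; models with covariates need numerical maximisation, as in Program MARK.

```python
def huggins_constant_p(n1, n2, m):
    """Conditional-likelihood estimate of N for two occasions and constant p."""
    distinct = n1 + n2 - m                  # individuals captured at least once
    p_hat = 2 * m / (distinct + m)          # closed-form conditional MLE of p
    p_ever = 1 - (1 - p_hat) ** 2           # probability of being captured at all
    # Horvitz-Thompson-type sum: each observed individual contributes 1 / p_ever
    return distinct / p_ever, p_hat

n_hat, p_hat = huggins_constant_p(254, 250, 19)
print(round(n_hat), round(p_hat, 4))
```

With the I-Track counts, this agrees with the Huggins estimate of 3,342 reported in Table 2, and it shows that the population size is obtained indirectly from the capture probability rather than as a parameter in its own right.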

Using Pledger's method, capture probabilities were
modelled dependent on time, behaviour, and/or heterogeneity.
The capture probabilities were modelled with a
linear logistic formulation as

$$\log\left(\frac{\theta_{jba}}{1 - \theta_{jba}}\right) = \mu + \tau_j + \beta_b + \eta_a$$

where $\theta_{jba}$ is the probability of capture for individual $i$ at
occasion $j$ with behaviour $b$ in group $a$; $b = b_{ij}$ is equal to
1 if individual $i$ was not caught before occasion $j$ and 2
otherwise; $\tau_j$ is the effect of time for occasion $j$; $\beta_b$ is the
effect of behaviour for an individual with behaviour $b$; $\eta_a$
is the effect of heterogeneity for an individual in group
$a = 1, 2, \ldots, A$ with probability $\pi_1, \pi_2, \ldots, \pi_A$; and $\mu$ is a
constant unknown parameter.
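
With no time, behaviour, or heterogeneity effects, the full likelihood involves only the population size and a single capture probability, so the population size can be estimated directly by a one-dimensional search. The sketch below assumes the standard multinomial full likelihood for the constant-probability closed model and profiles out the capture probability; it is an illustration of direct estimation of N, not the Program MARK implementation, and the function name `profile_loglik` is hypothetical.

```python
import math

def profile_loglik(n, distinct, total_captures, occasions=2):
    """Log-likelihood of the constant-p model at population size n, with p profiled out."""
    p = total_captures / (occasions * n)    # MLE of p for a given n
    misses = occasions * n - total_captures
    return (math.lgamma(n + 1) - math.lgamma(n - distinct + 1)
            + total_captures * math.log(p) + misses * math.log(1 - p))

distinct = 254 + 250 - 19        # individuals seen at least once
total_captures = 254 + 250       # capture events over both occasions

# Search integer population sizes above the number observed
candidates = range(distinct + 1, 20000)
n_hat = max(candidates, key=lambda n: profile_loglik(n, distinct, total_captures))
print(n_hat)
```

The maximiser lands close to the value of 3,330 reported for Pledger's model in Table 2. The small gap between this and the conditional-likelihood estimate arises because here the unobserved individuals enter the likelihood through the combinatorial term.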

Goodness of fit 

Goodness of fit in closed population models is problem- 
atic and is still a current statistical issue [18]. One of the 
main problems is that when heterogeneity is considered 
in the capture probabilities, there is an infinite number 
of saturated models due to the fact that individuals that 
are not captured cannot have their covariates measured; 
in other words, the saturated model is not uniquely 
determined due to the missing covariates. This is a 
problem for a formal goodness-of-fit test based on the 
deviance, which requires a uniquely specified saturated 
model. A goodness-of-fit test based on the conditional 
distribution of the observed data does not suffer from 
this problem. However, Link pointed out that very differ- 
ent capture probability models can give rise to an identical 
conditional distribution [27], rendering any goodness-of- 
fit test based on the conditional distribution powerless in 
distinguishing these capture probability models. 